Reducing SMT Rule Table with Monolingual Key Phrase
Abstract
This paper presents an effective approach for discarding most entries of the rule table in statistical machine translation. The rule table is filtered with monolingual key phrases, which are extracted from the source text using a technique based on term extraction. Experiments show that the rule table can be reduced by 78% without degrading translation performance. In most cases, the approach also yields measurable improvements in BLEU score.
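The filtering idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not state the exact matching criterion or the term-extraction technique, so the sketch assumes that a rule is kept only when its source side exactly matches a key phrase, and it takes the key phrases as given input. The function names `filter_rule_table` and `reduction_ratio`, the tuple layout of a rule, and the toy data are all hypothetical.

```python
def normalize(phrase):
    """Collapse runs of whitespace so that matching is insensitive to spacing."""
    return " ".join(phrase.split())


def filter_rule_table(rules, key_phrases):
    """Keep only rules whose source side is a key phrase.

    Assumption (not from the paper): a rule is a (source, target, score)
    tuple, and the criterion is an exact match on the source side.
    """
    keep = {normalize(p) for p in key_phrases}
    return [rule for rule in rules if normalize(rule[0]) in keep]


def reduction_ratio(original_size, filtered_size):
    """Fraction of the rule table that was discarded."""
    if original_size == 0:
        return 0.0
    return 1.0 - filtered_size / original_size


# Toy rule table (hypothetical entries, for illustration only).
rules = [
    ("rule table", "tabla de reglas", 0.7),
    ("key phrase", "frase clave", 0.8),
    ("table", "tabla", 0.6),
    ("of the", "del", 0.3),
    ("the", "el", 0.5),
]

# In the paper these come from term extraction over the source text;
# here they are simply listed by hand.
key_phrases = ["rule table", "key  phrase"]

filtered = filter_rule_table(rules, key_phrases)
ratio = reduction_ratio(len(rules), len(filtered))
```

With this toy data, two of the five rules survive, so three fifths of the table is discarded. A real system would read the rule table from the decoder's own file format and would likely use a looser matching criterion than exact equality.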
Similar resources
Improving Statistical Machine Translation with Monolingual Collocation
This paper proposes to use monolingual collocations to improve Statistical Machine Translation (SMT). We make use of the collocation probabilities, which are estimated from monolingual corpora, in two aspects, namely improving word alignment for various kinds of SMT systems and improving phrase table for phrase-based SMT. The experimental results show that our method improves the performance of...
FUN-NRC: Paraphrase-augmented Phrase-based SMT Systems for NTCIR-10 PatentMT
This paper describes FUN-NRC group’s machine translation systems that participated in the NTCIR-10 PatentMT task. The central motivation of this participation was to clarify the potential of automatically compiled collections of sub-sentential paraphrases. Our systems were built using our baseline phrase-based SMT system by augmenting its phrase table with novel translation pairs generated by c...
Statistical Machine Translation Support Improves Human Adjective Translation
In this paper we present a study in computer-assisted translation, investigating whether nonprofessional translators can profit directly from automatically constructed bilingual phrase pairs. Our support is based on state-of-the-art statistical machine translation (SMT), consisting of a phrase table that is generated from large parallel corpora, and a large monolingual language model. In our ex...
LIUM SMT Machine Translation System for WMT 2010
This paper describes the development of French–English and English–French machine translation systems for the 2010 WMT shared task evaluation. These systems were standard phrase-based statistical systems based on the Moses decoder, trained on the provided data only. Most of our efforts were devoted to the choice and extraction of bilingual data used for training. We filtered out some bilingual ...
Chained System: A Linear Combination of Different Types of Statistical Machine Translation Systems
The paper explores a way to learn post-editing fixes of raw MT outputs automatically by combining two different types of statistical machine translation (SMT) systems in a linear fashion. Our proposed system (which we call a chained system) consists of two SMT systems: (i) a syntax-based SMT system and (ii) a phrase-based SMT system (Koehn, 2004). We first translate source sentences of the bite...